Comments for MEDB 5502, Week 04

Topics to be covered

  • What you will learn
    • Indicator variables for three or more categories
    • Multiple factor analysis of variance
    • Checking assumptions of analysis of variance
    • Interactions in analysis of variance
    • Interactions in analysis of covariance
    • Interactions in multiple linear regression
    • Unbalanced data

Review oneway analysis of variance

  • \(H_0:\ \mu_1=\mu_2=\cdots=\mu_k\)
  • \(H_1:\ \mu_i \ne \mu_j\) for some i, j
    • Reject \(H_0\) if F-ratio is large
  • Note: when k=2, analysis of variance and the equal-variance t-test give the same result (\(F = t^2\)), so use either one
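
The F-ratio compares variation between group means to variation within groups. The sketch below computes it by hand and checks that, with two groups, it equals the square of the equal-variance t statistic. The data values are made up for illustration and the function names are hypothetical.

```python
from statistics import mean

def f_ratio(groups):
    """Return the one-way analysis of variance F-ratio for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

def pooled_t(a, b):
    """Return the equal-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    ss = sum((x - mean(a)) ** 2 for x in a) + sum((x - mean(b)) ** 2 for x in b)
    pooled_var = ss / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

# Made-up numbers, for illustration only.
group_a = [10.0, 12.0, 11.0, 13.0]
group_b = [14.0, 15.0, 13.0, 16.0]

f_value = f_ratio([group_a, group_b])
t_value = pooled_t(group_a, group_b)
print(round(f_value, 3), round(t_value ** 2, 3))
```

A large F-ratio means the group means are spread further apart than the within-group noise would suggest.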

Full moon data

  • Admission rates to mental health clinic before, during, and after full moon.
  • One year of data

Boxplot of full moon data

Descriptive statistics

Analysis of variance table

Tukey post hoc

Creating indicator variables

Running general linear model with all indicator variables

Analysis of variance table with first and second indicators

Irrelevant rows removed

Parameter estimates, 1 of 3

  • 11.458 - 13.417 = -1.959
  • 10.917 - 13.417 = -2.5

Parameter estimates, 2 of 3

  • 11.458 - 10.917 = 0.541
  • 13.417 - 10.917 = 2.5

Parameter estimates, 3 of 3

  • 10.917 - 11.458 = -0.541
  • 13.417 - 11.458 = 1.959
  • Reference category: the category whose indicator variable is left out of the model. The intercept is the reference mean, and each coefficient is a group mean minus the reference mean.
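
The three sets of parameter estimates differ only in the choice of reference category. A minimal sketch reproduces the subtractions from the three group means. The slides do not say which mean belongs to before, during, or after the full moon, so the keys are placeholder labels, and the function name is hypothetical.

```python
# Group means as they appear in the parameter-estimate arithmetic.
# The labels are placeholders; the slides do not map means to moon phases.
group_means = {"group_1": 11.458, "group_2": 13.417, "group_3": 10.917}

def indicator_coefficients(means, reference):
    """Return (intercept, {group: coefficient}) with one group as reference."""
    intercept = means[reference]
    coefs = {
        g: round(m - intercept, 3)
        for g, m in means.items()
        if g != reference
    }
    return intercept, coefs

# Rotate the reference category through all three groups.
for ref in group_means:
    intercept, coefs = indicator_coefficients(group_means, ref)
    print(ref, intercept, coefs)
```

Changing the reference category changes the coefficients but not the fitted group means.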

Using moon as a fixed factor

Removing the unneeded rows

Parameter estimates using Moon as a fixed factor

Live demo, Indicator variables for three or more categories

Break #1

  • What you have learned
    • Indicator variables for three or more categories
  • What’s coming next
    • Multiple factor analysis of variance

Mathematical model

  • \(Y_{ijk} = \mu + \alpha_i + \beta_j +\epsilon_{ijk}\)
    • i=1,…,a levels of the first categorical variable
    • j=1,…,b levels of the second categorical variable
    • k=1,…,n replicates within each combination of the first and second categories

  • \(H_0:\ \alpha_i=0\) for all i

  • \(H_0:\ \beta_j=0\) for all j
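
One way to read the additive model: the expected value in each cell is the overall mean plus one effect per factor, so the gap between two levels of the second factor is the same at every level of the first. The parameter values below are hypothetical and chosen only to illustrate this.

```python
# Hypothetical parameter values, chosen only to illustrate the model.
mu = 12.0
alpha = {"a1": -1.0, "a2": 0.0, "a3": 1.0}  # effects of the first factor
beta = {"b1": 0.5, "b2": -0.5}              # effects of the second factor

def cell_mean(i, j):
    """Expected Y for level i of the first factor and level j of the second."""
    return mu + alpha[i] + beta[j]

# Under the additive model, the b1 minus b2 gap does not depend on i.
gaps = {i: cell_mean(i, "b1") - cell_mean(i, "b2") for i in alpha}
print(gaps)
```

If every \(\alpha_i\) were zero, the cell means would not change across levels of the first factor, which is what the first null hypothesis states.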

Crosstabulation of categorical predictors

Analysis of variance table for moon data

Removing irrelevant rows

Parameter estimates for the full moon model

Tukey post hoc test

Live demo, Multiple factor analysis of variance

Break #2

  • What you have learned
    • Multiple factor analysis of variance
  • What’s coming next
    • Checking assumptions of analysis of variance

Assumptions

  • Normality
  • Equal variances
  • Independence
  • Note: no linearity assumption when all predictors are categorical
    • Linearity is needed only for linear regression and analysis of covariance
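
A rough sketch of the numbers behind these checks, using made-up data. In analysis of variance the predicted value is the group mean, so a residual is an observation minus its group mean. The plotting position (i + 0.5)/n used for the normal quantiles is one common convention among several, and is an assumption here.

```python
from statistics import NormalDist, mean, variance

# Made-up data for three groups, for illustration only.
data = {
    "g1": [9.0, 11.0, 10.0, 12.0, 13.0],
    "g2": [14.0, 12.0, 15.0, 13.0, 16.0],
    "g3": [10.0, 8.0, 11.0, 9.0, 12.0],
}

# Residual = observation minus its group mean (the predicted value).
residuals = [x - mean(xs) for xs in data.values() for x in xs]

# Equal variances: compare the sample variance within each group.
group_variances = {g: variance(xs) for g, xs in data.items()}

# Normality: pair sorted residuals with standard normal quantiles.
n = len(residuals)
qq_points = [
    (NormalDist().inv_cdf((i + 0.5) / n), r)
    for i, r in enumerate(sorted(residuals))
]

print(group_variances)
for theoretical, observed in qq_points:
    print(round(theoretical, 2), round(observed, 2))
```

Points that fall close to a straight line in the Q-Q pairs are consistent with normality. Similar variances across groups are consistent with the equal variance assumption.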

Q-Q plot of residuals

Residual versus predicted value plot

Live demo, Checking assumptions of analysis of variance

Break #3

  • What you have learned
    • Checking assumptions of analysis of variance
  • What’s coming next
    • Interactions in analysis of variance

What is an interaction

  • Impact of one variable is influenced by a second variable
  • Example: alcohol changes the effect of sleeping pills
  • Three types of interactions
    • Between two categorical predictors
    • Between a categorical and a continuous predictor
    • Between two continuous predictors
  • Interactions greatly complicate interpretation

Interaction plot

  • X axis, first categorical variable
  • Separate lines for second categorical variable
  • Y axis, average outcome

Hypothetical interaction plots, 1 of 4

  • No interaction
  • Ineffective treatment
  • Boys/girls similar

  • No interaction
  • Ineffective treatment
  • Boys fare better than girls

Hypothetical interaction plots, 2 of 4

  • No interaction
  • Effective treatment
  • Boys/girls similar

  • No interaction
  • Effective treatment
  • Boys fare better than girls

Hypothetical interaction plots, 3 of 4

  • Significant interaction
  • Harmful treatment in boys
  • Effective treatment in girls

  • Significant interaction
  • Ineffective treatment in boys
  • Effective treatment in girls

Hypothetical interaction plots, 4 of 4

  • Significant interaction
  • Girls fare better overall
  • Effective treatment
  • Much more effective in boys
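
For a two-by-two layout, the interaction can be summarized as a difference of differences: the treatment effect in boys minus the treatment effect in girls. The sketch below uses hypothetical cell means that mimic two of the plots. Whether a nonzero contrast is statistically significant still depends on the F test for the interaction.

```python
def interaction_contrast(cell_means):
    """Return (effect in boys, effect in girls, difference of the two effects)."""
    effect_boys = cell_means[("boys", "treated")] - cell_means[("boys", "control")]
    effect_girls = cell_means[("girls", "treated")] - cell_means[("girls", "control")]
    return effect_boys, effect_girls, effect_boys - effect_girls

# Hypothetical means: effective treatment, boys fare better, no interaction.
parallel = {
    ("boys", "control"): 50.0, ("boys", "treated"): 60.0,
    ("girls", "control"): 40.0, ("girls", "treated"): 50.0,
}
# Hypothetical means: harmful treatment in boys, effective in girls.
crossing = {
    ("boys", "control"): 50.0, ("boys", "treated"): 40.0,
    ("girls", "control"): 40.0, ("girls", "treated"): 50.0,
}

print(interaction_contrast(parallel))  # contrast of 0: parallel lines
print(interaction_contrast(crossing))  # nonzero contrast: lines cross
```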

Data dictionary for exercise data, 1 of 3

data_dictionary: exercise.sas7bdat
description: |
  This dataset is used in a tutorial about interactions. A description from the original source: The dataset consists of data describing the amount of weight loss achieved by 900 participants in a year-long study of 3 different exercise programs, a jogging program, a swimming program, and a reading program which serves as a control activity. Researchers were interested in how the weekly number of hours subjects chose to exercise predicted weight loss.

Data dictionary for exercise data, 2 of 3

loss:
  label: Average weekly weight loss
  note: negative scores denote weight gain
  scale: real
  unit: unknown, presumably pounds
hours:
  label: weekly average amount of exercise
  unit: hours
effort:
  label: weekly effort scores
  note: self report
  scale: non-negative integer
  range: 0 through 50
  direction: 50 denoting maximum physical effort

Data dictionary for exercise data, 3 of 3

prog:
  label: exercise program
  note: reading is a control
  values:
    Jogging: 1
    Swimming: 2
    Reading: 3
female:
  scale: binary categorical
  values:
    male: 0
    female: 1
satisfied:
  label: satisfied with weight lost
  scale: binary categorical
  values:
    Dissatisfied: 0
    Satisfied: 1

Box plots of exercise data

Mean values for the interaction

Analysis of variance table for interaction model

Parameter estimates for the interaction model

Interaction plot, 1 of 2

Interaction plot, 2 of 2

Live demo, Interactions in analysis of variance

Break #4

  • What you have learned
    • Interactions in analysis of variance
  • What’s coming next
    • Interactions in analysis of covariance

A second type of interaction

  • Interactions in analysis of covariance
    • Between categorical predictor and continuous predictor
    • Different slopes within each category

Interaction between exercise program and hours spent exercising

Testing for interaction in analysis of covariance

Table with irrelevant rows removed

Parameter estimates

  • Intercept for prog=1, -8.997 + 2.216 = -6.781
  • Intercept for prog=2, 9.993 + 2.216 = 12.209
  • Intercept for prog=3, 2.216
  • Slope for prog=1, 10.409 + -2.956 = 7.453
  • Slope for prog=2, 9.83 + -2.956 = 6.874
  • Slope for prog=3, -2.956
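
The arithmetic above can be sketched in a few lines. prog=3 is the reference category, so its line uses the base intercept and the base slope for hours, and the other programs add their offsets. The estimates are copied from the bullets above. The function and variable names are hypothetical.

```python
# Estimates copied from the parameter-estimate bullets.
# prog=3 is the reference category, so its offsets are zero.
base_intercept = 2.216
base_slope = -2.956
intercept_offset = {1: -8.997, 2: 9.993, 3: 0.0}
slope_offset = {1: 10.409, 2: 9.83, 3: 0.0}

def line_for(prog):
    """Return (intercept, slope) of the fitted line for one program."""
    return (
        round(base_intercept + intercept_offset[prog], 3),
        round(base_slope + slope_offset[prog], 3),
    )

def predicted_loss(prog, hours):
    """Predicted weight loss for a program at a given number of hours."""
    intercept, slope = line_for(prog)
    return intercept + slope * hours

for prog in (1, 2, 3):
    print(prog, line_for(prog))
```

A separate slope for each program is what the interaction between a categorical and a continuous predictor means.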

Live demo, Interactions in analysis of covariance

Break #5

  • What you have learned
    • Interactions in analysis of covariance
  • What’s coming next
    • Interactions in multiple linear regression

Interaction between hours and effort

  effort: 
    label: weekly effort scores
    note: self report
    scale: 0 through 50
    direction:
      0 denoting minimal physical effort and
      50 denoting maximum effort.

Analysis of variance table

Table of means

Centered analysis

Weight loss at various conditions

  • hours = 2, effort = 30, predict 10.005
  • hours = 4, effort = 30, predict 10.005 + 2.291*2
  • hours = 2, effort = 40, predict 10.005 + 0.707*10
  • hours = 4, effort = 40, predict 10.005 + 2.291*2 + 0.707*10 + 0.393*2*10
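
A minimal sketch of these predictions, assuming the model is centered at hours = 2 and effort = 30 (inferred from the first bullet, where the prediction equals the intercept). The interaction term multiplies the two centered deviations, for example (4 - 2) × (40 - 30) at the last condition. The constant and function names are hypothetical.

```python
# Coefficients taken from the bullets above. Centering at hours = 2 and
# effort = 30 is an assumption inferred from the first prediction.
INTERCEPT = 10.005
B_HOURS = 2.291
B_EFFORT = 0.707
B_INTERACTION = 0.393
HOURS_CENTER = 2
EFFORT_CENTER = 30

def predict_loss(hours, effort):
    """Predicted weight loss from the centered interaction model."""
    h = hours - HOURS_CENTER
    e = effort - EFFORT_CENTER
    return INTERCEPT + B_HOURS * h + B_EFFORT * e + B_INTERACTION * h * e

for hours, effort in [(2, 30), (4, 30), (2, 40), (4, 40)]:
    print(hours, effort, round(predict_loss(hours, effort), 3))
```

With a positive interaction coefficient, raising both hours and effort predicts more weight loss than the sum of the two separate increases.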

Live demo, Interactions in multiple linear regression

Break #6

  • What you have learned
    • Interactions in multiple linear regression
  • What’s coming next
    • Unbalanced data

FEV data

Descriptive Abstract: Sample of 654 youths, aged 3 to 19, in the area of East Boston during middle to late 1970’s. Interest concerns the relationship between smoking and FEV. Since the study is necessarily observational, statistical adjustment via regression models clarifies the relationship.

  • fev continuous measure (liters)
  • sex discrete/nominal (Female coded 0, Male coded 1)
  • smoke discrete/nominal (Nonsmoker coded 0, Smoker coded 1)
    • Source: https://jse.amstat.org/datasets/fev.txt

Line plots of means for unbalanced data

Table of means

Table of frequencies and column percentages
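
Why unbalanced data matters can be sketched with hypothetical numbers (these are not the FEV results). When cell counts are unequal, a raw marginal mean weights each cell by its sample size, so the raw gap between smokers and nonsmokers mixes the smoking effect with the difference in sex composition. Averaging the cell means with equal weight removes that imbalance.

```python
# Hypothetical cell means and counts, not taken from the FEV data.
# Each value is (mean, count).
cells = {
    ("nonsmoker", "female"): (2.4, 40),
    ("nonsmoker", "male"): (2.8, 10),
    ("smoker", "female"): (2.9, 10),
    ("smoker", "male"): (3.3, 40),
}

def weighted_mean(level):
    """Raw marginal mean: each cell counts in proportion to its sample size."""
    rows = [(m, n) for (smoke, _), (m, n) in cells.items() if smoke == level]
    return sum(m * n for m, n in rows) / sum(n for _, n in rows)

def unweighted_mean(level):
    """Average of the cell means: each cell counts equally."""
    rows = [m for (smoke, _), (m, _) in cells.items() if smoke == level]
    return sum(rows) / len(rows)

raw_gap = weighted_mean("smoker") - weighted_mean("nonsmoker")
adjusted_gap = unweighted_mean("smoker") - unweighted_mean("nonsmoker")
print(round(raw_gap, 2), round(adjusted_gap, 2))
```

In this made-up table the gap within each sex is 0.5, but the raw gap is larger because most smokers are male and most nonsmokers are female.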

Live demo, Unbalanced data

Summary

  • What you have learned
    • Indicator variables for three or more categories
    • Multiple factor analysis of variance
    • Checking assumptions of analysis of variance
    • Interactions in analysis of variance
    • Interactions in analysis of covariance
    • Interactions in multiple linear regression
    • Unbalanced data

Additional topics??